Abstract
Background: Clinical management of sickle cell disease (SCD) varies by genotype, but accurate genotyping still depends on labor-intensive expert review of hemoglobin electrophoresis or high-performance liquid chromatography (HPLC) data. Expert review also has variable accuracy: genotypes reported in the electronic medical record (EMR) show discrepancy rates as high as 71.9%. Acute transfusions exacerbate the problem by confounding hemoglobin fractions and increasing misclassification risk. Existing machine learning (ML) approaches have shown promise but often lack features needed for clinical adoption, such as accounting for transfusions, plausibility checks, and interpretability. To address these gaps, we developed a system that predicts an individual's hemoglobin genotype from routine clinical data and was trained on genotypes confirmed by whole genome sequencing (WGS) to ensure accurate reference labels.
Methods: A cohort of 794 adult SCD individuals was retrospectively assembled with genotypes for HbSS, HbSC, HbSβ⁰, and HbSβ⁺ confirmed by WGS, alongside hemoglobin variants identified by capillary electrophoresis. Individuals with HbAS (sickle cell trait, n=298) were identified solely by electrophoresis and clinical records. Of the SCD individuals, 234 (~30%) had documented transfusions within 30 days of testing. Clinical data were analyzed using our Sickle Cell Hemoglobin Analysis via Rules and Prediction (SHARP) pipeline, which takes a two-tier approach. Tier 1 applied clinical rules to assign high-confidence calls for five genotypes (HbSS, HbSC, HbSβ⁰, HbSβ⁺, HbAS), attenuated confidence in transfusion-confounded patterns, and identified and excluded non-SCD phenotypes. Tier 2 used an XGBoost classifier with 10-fold stratified cross-validation and hyperparameter tuning. Inputs included hemoglobin fractions, complete blood count (CBC) parameters, transfusion status, and derived ratios between these factors. A post-hoc constraint module enforced fundamental hemoglobin biochemistry by penalizing HbSC predictions when HbC<10%, boosting HbAS probability for classic trait ratios, and adjusting probabilities for HbSS and HbSβ variants based on HbA and HbA₂ thresholds; probabilities were then renormalized. SHapley Additive exPlanations (SHAP) was used to visualize feature contributions for each prediction. Performance of SHARP was evaluated under a coverage-based framework (proportion of cases automated at ≥90% confidence, accuracy on those cases, and expert-review rate) alongside macro-averaged F₁, AUROC, calibration error, and traditional accuracy for comparison. The final pipeline was adapted into a user-friendly application for making individual genotype predictions and a suite of tools for genotyping larger datasets.
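As an illustration of the post-hoc constraint step described above, the following is a minimal sketch of one rule (penalizing HbSC when HbC<10%) followed by renormalization. It is not the SHARP implementation: the function name `apply_hbc_constraint`, the ASCII genotype keys, and the penalty factor of 0.1 are assumptions made for this example, and only the 10% HbC threshold comes from the description above.

```python
# Hypothetical sketch of one post-hoc plausibility constraint.
# Only the HbC < 10% threshold is taken from the text; the penalty
# factor and all names are illustrative assumptions.

HBC_THRESHOLD_PCT = 10.0  # from the Methods description


def apply_hbc_constraint(probs, hb_c_pct, penalty=0.1):
    """Down-weight HbSC when measured HbC is below 10%, then renormalize.

    probs    -- dict mapping genotype label to classifier probability
    hb_c_pct -- measured HbC fraction, in percent
    penalty  -- assumed multiplicative factor applied to HbSC
    """
    adjusted = dict(probs)  # copy so the caller's dict is not mutated
    if hb_c_pct < HBC_THRESHOLD_PCT:
        adjusted["HbSC"] *= penalty

    total = sum(adjusted.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")

    # Renormalize so the adjusted probabilities sum to 1 again.
    return {genotype: p / total for genotype, p in adjusted.items()}


if __name__ == "__main__":
    # Made-up classifier output for a sample with almost no HbC.
    raw = {"HbSS": 0.30, "HbSC": 0.50, "HbSb0": 0.10, "HbSb+": 0.05, "HbAS": 0.05}
    print(apply_hbc_constraint(raw, hb_c_pct=2.0))
```

Applying the penalty multiplicatively and then renormalizing keeps the output a valid probability distribution, so the same confidence threshold can be applied before and after the constraint.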
Results: In 10-fold cross-validation of 794 cases, the full SHARP pipeline automated 95.6% (759/794) of cases with 98.8% accuracy; the remaining 4.4% (35/794) also received predictions but were flagged for expert review because of lower confidence. Clinical rules classified 49.9% (396/794) of individuals, while the remaining 50.1% (398/794) relied on ML. The final accuracies by genotype for the full pipeline were 95.9% for HbSS, 97.0% for HbSC, 82.4% for HbSβ⁰, 94.3% for HbSβ⁺, and 100% for HbAS. The model's macro-averaged F₁ score from cross-validation was 0.866, and the macro-averaged AUROC was 0.982. Accuracy was 95.3% in transfused samples versus 98.0% in non-transfused samples.
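The coverage-based metrics reported above (automation rate, accuracy on automated cases, and expert-review rate) can be computed as in the sketch below. This is an illustrative assumption rather than the authors' evaluation code: the function name `coverage_metrics` and the toy inputs are hypothetical, and only the ≥90% confidence threshold comes from the Methods.

```python
# Hypothetical sketch of coverage-based evaluation.
# The 0.90 threshold is from the text; names and data are illustrative.
import math


def coverage_metrics(confidences, correct, threshold=0.90):
    """Summarize automated vs. expert-review cases.

    confidences -- top-class probability for each case
    correct     -- bool per case: did the prediction match the reference?
    threshold   -- cases at or above this confidence are automated
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    n_total = len(confidences)
    if n_total == 0:
        raise ValueError("at least one case is required")

    # Keep the correctness flags of cases confident enough to automate.
    automated = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    n_auto = len(automated)

    return {
        "automation_rate": n_auto / n_total,
        # Accuracy is undefined when no case clears the threshold.
        "automated_accuracy": sum(automated) / n_auto if n_auto else math.nan,
        "review_rate": (n_total - n_auto) / n_total,
    }


if __name__ == "__main__":
    # Made-up example: four cases, one below the confidence threshold.
    print(coverage_metrics([0.99, 0.95, 0.80, 0.92], [True, True, False, False]))
```

Reporting accuracy only on the automated subset, together with the review rate, separates how often the system acts on its own from how reliable it is when it does.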
Discussion: SHARP is a first-of-its-kind comprehensive SCD genotyping tool with the potential to reduce the burden of expert review while increasing accuracy, especially for rarer genotypes such as HbSβ⁰. Expert hematology review currently achieves only approximately 50% accuracy for HbSβ⁰ because HbSβ⁰ is difficult to distinguish from HbSS with coinherited alpha thalassemia and because HbA₂ has weak diagnostic capacity. Beyond real-time diagnostic support, SHARP also enables retrospective genotyping of existing datasets that lack genotypes and validation of manually assigned genotypes in the EMR. Furthermore, the integration of SHAP explanations provides transparency into how each individual prediction was made. Moving forward, we will perform extensive validation and testing to increase generalizability, obtain more training data to further increase accuracy, particularly for HbSβ⁰, and support integration with existing laboratory systems.